WHICH REGRESSION?


CHARLES P. WINSOR


The statistician has often to deal with the problem of fitting regres- 
sions when errors of measurement are present in one or both of the 
variables. Occasionally, some question arises as to which regression 
line is appropriate. Although this problem has been dealt with before, 
a good deal of confusion appears to exist, and occasional errors or mis- 
leading statements appear in textbooks. A very elementary presen- 
tation of a particularly simple case is offered here in the hope that it 
may be helpful to the biometrician. 

The case which we consider is the following. A pair of variables
u, v has a bivariate normal distribution in the general population, with
variances $\sigma_u^2$, $\sigma_v^2$ and correlation $\rho_{uv}$. Our measurements of u and v
are subject to error. We assume that these errors are independent,
unbiased, and normally distributed. We actually record, then,
individual measurements

$$x = u + \delta, \qquad y = v + \epsilon,$$

where $\delta$, $\epsilon$ are independent normal deviates with means zero and
variances $\sigma_\delta^2$, $\sigma_\epsilon^2$. We are dealing, that is, with a case in which both errors
of measurement and "organic variation" are present.

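It is easy to see that in the general population the x, y measurements
will be normally correlated; and the following relations can be
shown to hold.

For the variances we have

$$\sigma_x^2 = \sigma_u^2 + \sigma_\delta^2; \qquad \sigma_y^2 = \sigma_v^2 + \sigma_\epsilon^2.$$

The correlation between x and y is

$$\rho_{xy} = \rho_{uv} \frac{\sigma_u \sigma_v}{\sigma_x \sigma_y}.$$

The regression slopes in the general population are

$$\beta_{y \cdot x} = \rho_{uv} \frac{\sigma_u \sigma_v}{\sigma_x^2}; \qquad \beta_{x \cdot y} = \rho_{uv} \frac{\sigma_u \sigma_v}{\sigma_y^2}.$$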
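
The relations above can be checked numerically. The following is a minimal sketch, assuming the model x = u + delta, y = v + epsilon with independent errors; the function name and the numerical values are illustrative only and do not come from the paper.

```python
import math

def observed_moments(sigma_u, sigma_v, rho_uv, sigma_delta, sigma_eps):
    """Population moments of (x, y) when x = u + delta and y = v + eps."""
    var_x = sigma_u ** 2 + sigma_delta ** 2
    var_y = sigma_v ** 2 + sigma_eps ** 2
    # Independent, unbiased errors add nothing to the covariance.
    cov_xy = rho_uv * sigma_u * sigma_v
    return {
        "var_x": var_x,
        "var_y": var_y,
        "rho_xy": cov_xy / math.sqrt(var_x * var_y),
        "slope_y_on_x": cov_xy / var_x,
        "slope_x_on_y": cov_xy / var_y,
    }

# Illustrative (hypothetical) values.
m = observed_moments(sigma_u=2.0, sigma_v=3.0, rho_uv=0.8,
                     sigma_delta=1.0, sigma_eps=1.5)
print(m)
```

With these values the observed correlation (0.64) is smaller than the true correlation (0.8), and the slope of y on x (0.96) is flatter than the slope of v on u (1.2), which is the attenuation the text describes.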

Suppose now that we have a sample set of pairs of values of x and y.
Which regression should we use, that of y on x, or that of x on y?

This question is meaningless as it stands. Before it can be an- 
swered, we must know (1) how the (x, y) values were obtained and 
(2) what we are going to use the regression for. 

As to (1), there are two common situations. 

(a) The pairs of (x, y) values were obtained as a random sample
    from the general population.

(b) The (x, y) values were obtained by selecting a set of values
    of one variable, say x, and subsequently measuring the
    corresponding values of y.

Figure 1. Left: Situation (a); x, y pairs randomly sampled.
Right: Situation (b); x values arbitrarily chosen.


These two situations are indicated in the two panels of Figure 1. 
As to (2), our proposed use of the regression line, there are four 
more or less usual possibilities. 

(i) We may want a relation from which we can estimate, in the
    future, the value of y, given a future measurement x.

(ii) We may want a relation from which to estimate x, given a
    future measurement y.

(iii) We may want a relation for estimating the true value v, given
    a future measurement x. (Or, u given y.)

(iv) We may wish to estimate the true relation between the true
    values u and v.

We shall consider first situation (a). Here, since our (x, y) pairs
are a random sample from the population, we can estimate all the
constants of the (x, y) distribution. With this information available, it
is clear that we shall want the regression of y on x in problem (i), and
that of x on y in problem (ii). This is, in fact, the classical bivariate
normal regression problem, and is quite unaffected by the fact that
errors of measurement are involved.

In problem (iii), it is easy to see that the regression of the true
value v on the measurement x will be the same as that of the
measurement y on x. This is so because y differs from v only by random and
unbiased errors. Our best estimate of v will therefore be obtained
from the regression of y on x.

For problem (iv), we need more information than is obtainable
simply from a sample of (x, y) pairs. If we are trying to estimate the
relation between the true values u and v, we need, in addition to the
(x, y) pairs, estimates of the error variances $\sigma_\delta^2$, $\sigma_\epsilon^2$. (In some cases
these are obtainable by duplicate measurements; but this is not always
true.)

The physicists, in dealing with this problem, generally assume that 
their true values are perfectly correlated (functionally related), and 
that only errors of measurement are responsible for observational 
scatter. The computational techniques appropriate to this case are 
given fully by Deming (1943). The more general situation, where 
u and v may have any degree of correlation, has long been of concern
to the psychologists. Spearman’s ‘‘correction for attenuation’’ is one 
attempt to deal with it. 
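
As an illustration of problem (iv), the sketch below removes known error variances from the observed moments to recover the correlation and slope for the true values. It assumes the additive error model stated earlier and that the error variances have been estimated separately (for instance from duplicate measurements); the function name and numbers are hypothetical, and this is only one simple way to apply a correction of the attenuation type.

```python
import math

def disattenuate(var_x, var_y, cov_xy, var_delta, var_eps):
    """Recover rho_uv and the slope of v on u from observed moments.

    Assumes x = u + delta, y = v + eps with independent errors, so that
    var_u = var_x - var_delta, var_v = var_y - var_eps, cov_uv = cov_xy.
    """
    var_u = var_x - var_delta
    var_v = var_y - var_eps
    if var_u <= 0 or var_v <= 0:
        raise ValueError("error variance is not smaller than observed variance")
    rho_uv = cov_xy / math.sqrt(var_u * var_v)
    slope_v_on_u = cov_xy / var_u
    return rho_uv, slope_v_on_u

# Hypothetical observed moments and error variances.
rho_uv, slope = disattenuate(var_x=5.0, var_y=11.25, cov_xy=4.8,
                             var_delta=1.0, var_eps=2.25)
print(rho_uv, slope)
```

In practice the inputs are sample estimates, so the corrected values carry sampling error of their own, and the subtraction can fail when the estimated error variance exceeds the observed variance.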

We now turn to situation (b), in which we selected a set of x values
and measured the corresponding y's. Here we can obtain an estimate
of the regression of y on x, and of the variance, $\sigma_{y \cdot x}^2$, of y around the
regression line. We cannot, however, estimate the population values
of $\bar{x}$, $\bar{y}$, $\sigma_x^2$, $\rho_{xy}$, nor can we estimate the population regression of x on y.
The question is no longer "Which regression should we use?" but
"What can we do with the single regression we have?"

Of the four problems previously considered, no difficulty arises in 
(i) or (iii), since in each of these the regression of y on x is required. 
We shall not consider problem (iv), which is obviously intractable 
under this situation. There remains for consideration problem (ii), 
to establish a relation for predicting x given y. 

A careful discussion of this problem has been given by Eisenhart
(1939) in terms of confidence intervals. We shall endeavor here to
concentrate on the fundamental problems of inference, avoiding the
complications which arise out of the finite size of our samples. Let us
assume, then, that we have been given an infinitely large sample, so
that we know the exact regression of y on x, and the exact variance of
y around the regression line, in the population. We do not know
anything else. Suppose now a new (x, y) pair is taken from the
population and we are informed of its y value; what can we say about
its x value?

Figure 2. Population regression, y on x, and distribution of y
about regression line.


At this point a diagram will be helpful.
We are given a regression line

$$\tilde{y} = a + bx$$

(where we introduce the symbol $\tilde{y}$ to distinguish the regression value
from the individual sample values); and (since we are assuming
normality) we know that for any assigned value of x, the values of y are
normally distributed with mean

$$\tilde{y} = a + bx$$

and variance $\sigma_{y \cdot x}^2$. We can, therefore, given any x, state the
probability that the corresponding y shall fall within any assigned limits.
In particular, for example, we can state that the probability that y lies
in the interval

$$a + bx - 1.96\,\sigma_{y \cdot x} \quad \text{to} \quad a + bx + 1.96\,\sigma_{y \cdot x}$$

is .95; and by proper choice of the coefficient of $\sigma_{y \cdot x}$, we can make
corresponding statements for any other level of probability.

Figure 3. Confidence limits for y given x.

Again a diagram will be helpful. 

We draw the regression line of y on x, and above and below it we
draw lines at a vertical distance of $1.96\,\sigma_{y \cdot x}$. The probability that y
falls between these two lines is .95 for every x value; and is therefore
.95 for the totality of all possible y values. If, then, we consider all
possible pairs of (x, y) values, we see that 95% of them will lie inside
the strip which we have constructed; and that accordingly we can
assert, given a value of y, that the corresponding x lies within the strip,
and that this assertion will be true, over all possible cases, 95% of the
time.

We can express all this algebraically. We can set up the double
inequality

$$a + bx - t_P \sigma_{y \cdot x} < y < a + bx + t_P \sigma_{y \cdot x} \qquad \text{(A)}$$

with $t_P$ properly chosen, and assert that the probability that this
inequality is satisfied is P. Algebraically, we can rearrange this
inequality to read as an inequality on x, and obtain

$$\frac{1}{b}(y - a - t_P \sigma_{y \cdot x}) < x < \frac{1}{b}(y - a + t_P \sigma_{y \cdot x}). \qquad \text{(B)}$$


Before we allow our algebra to run away with our judgment, it will 
be well to consider more closely the exact meanings of (A) and (B). 
With regard to (A), we observe that it is a statement about the random 
variable y, which involves a fixed but arbitrary value of x. The statement
(A) has probability P of being true for any such arbitrary x. It
has therefore the same probability P of being true for the aggregate
of all values of x.

Consider now statement (B). If the random variable in this statement
is y, then (B) and (A) are completely equivalent in meaning;
but this is not what we really want. We should like to interpret (B)
as a probability statement about an unknown x in terms of an observed
y. But since (A) holds with probability P for the totality of all possible
pairs (x, y), (B) will also hold with the same probability over
all possible pairs. If, then, we make the assertion (B) about every
(x, y) pair we draw, the assertion will be true with probability P;
that is, a proportion P of our assertions will be true in the long run.
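
The long-run interpretation of the inverted limits can be illustrated with a small simulation. This is a sketch under stated assumptions: the regression constants a, b and the residual standard deviation are taken as known exactly (the infinite-sample case of the text), the x values are arbitrary, and all names and numbers are illustrative.

```python
import random

def inverted_limits(y, a, b, sigma, t=1.96):
    """Limits on x obtained by rearranging a + b*x -/+ t*sigma around y."""
    if b == 0:
        raise ValueError("slope must be non-zero to invert the regression")
    lo = (y - a - t * sigma) / b
    hi = (y - a + t * sigma) / b
    # A negative slope reverses the order of the two limits.
    return (lo, hi) if b > 0 else (hi, lo)

def coverage(xs, a, b, sigma, t=1.96, seed=0):
    """Proportion of (x, y) pairs for which the inverted limits contain x."""
    rng = random.Random(seed)
    hits = 0
    for x in xs:
        y = rng.gauss(a + b * x, sigma)
        lo, hi = inverted_limits(y, a, b, sigma, t)
        hits += lo <= x <= hi
    return hits / len(xs)

# x values chosen arbitrarily, as in situation (b).
xs = [i % 10 for i in range(20000)]
rate = coverage(xs, a=1.0, b=2.0, sigma=0.5)
print(rate)
```

The proportion comes out close to 0.95 whatever set of x values is used, which is the sense in which the assertion is true "in the long run"; it says nothing about the probability attached to any single observed y.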

We cannot, however, say that in each particular case, or with
respect to each particular y value, the assertion has probability P. In
fact, it is easy to see that this is not true. For suppose that, knowing
the true regression, we set up our limits on x, in the form (B). And
now suppose that in a long series of subsequent samples the value of x
were (unknown to us) fixed and equal to $x_0$, say. The situation would
then be that shown in Figure 4.

Figure 4. Illustrating meaning of inversion of confidence limits.

In this situation, whenever y falls outside the limits

$$a + bx_0 \pm t_P \sigma_{y \cdot x} \qquad \text{(C)}$$

our assertion (B) is false (it asserts that x lies within limits which do
not in fact include the true value $x_0$) and whenever y falls inside the
limits (C) our assertion (B) is true. Over the aggregate of all y
values obtained, assertion (B) has probability P of being true, though
for each particular y value it is either always true or always false.
The case we have just considered is an extreme one, chosen to
illustrate the point. It corresponds to our initial assumptions about
normal correlation if we make $\sigma_x^2$ zero. Figure 5 illustrates the situation
for the general case of the bivariate surface. Here we have drawn
both regression lines, that of y on x and that of x on y, and both sets
of confidence limits. It will be noticed, first, that the two sets of
confidence limits do not coincide, and second, that the direct limits are
narrower than the inverted limits. The advantages of using the direct
regression when it is available are clear.

Figure 5. Regression lines and confidence limits, bivariate normal,
x, y pairs randomly sampled.

So far we have considered the case of an infinitely large sample. 
In practice our samples are of finite size, and accordingly our estimates 
of the regression line and of the variance around it are subject to 
sampling error. This results in more complicated algebra; in particular
the confidence limits become hyperbolas, and in general we have
to face minor but somewhat troublesome complications of computation.
Again, reference may be made to Eisenhart, where explicit formulae
are given.

We may perhaps point out that this inversion of the regression line
and the confidence limits is often the only available solution to the
regression and estimation problem. For illustration we need only
point to the case of biological assay. Here we are attempting to esti- 
mate the potency of an unknown in terms of a biological response, with 
the aid of a response curve based on known doses of a standard. The 
only possible regression line is that of response on dosage; even the 
notion of a population distribution of x values (potencies) becomes so 
vague as to be meaningless. 

Our general principle, it appears, should be: if it is possible and
meaningful, arrange the experiment so that the desired regression
can be determined directly. That is, the variable from which prediction
is to be made should be taken as the independent variable. In
those numerous situations where this is not possible, use the inverted
regression.
AN APPROXIMATE DISTRIBUTION OF ESTIMATES
OF VARIANCE COMPONENTS


F. E. SATTERTHWAITE 
General Electric Company, Ft. Wayne, Indiana 


1. INTRODUCTION 


In many problems, only simple mean square statistics are required 
to estimate whatever variances are involved. If the underlying popu- 
lations are normal, these mean squares are distributed as is chi-square 
and may therefore be used in the standard chi-square, Student's t and
Fisher’s z tests. Frequently, however, the variances must be esti- 
mated by linear combinations of mean squares. Crump (1) has 
recently discussed a problem of this type, based on the following data: 


ANALYSIS OF VARIANCE OF TOTAL EGG PRODUCTION OF 12 FEMALES
(D. melanogaster) FROM 25 RACES IN 4 EXPERIMENTS


Source of            Degrees of                        Average Value of the
Variation            Freedom     Mean Square           Mean Square

Experiments               3      $MS_e$ = 46,659       $\sigma_z^2 + 12\sigma_{er}^2 + 300\sigma_e^2$
Races                    24      $MS_r$ = 3,243        $\sigma_z^2 + 12\sigma_{er}^2 + 48\sigma_r^2$
E x R                    72      $MS_{er}$ = 459       $\sigma_z^2 + 12\sigma_{er}^2$
Within Subclasses     1,100      $MS_z$ = 231          $\sigma_z^2$
The variance of the mean of the i-th race is shown in his paper to
be estimated by

(1)  $V(\bar{x}_{.i.}) = \dfrac{1}{e}(\sigma_e^2 + \sigma_{er}^2) + \dfrac{1}{en}(\sigma_z^2)$

(2)  $\phantom{V(\bar{x}_{.i.})} = \dfrac{1}{e}\left[\dfrac{MS_e - MS_{er}}{300} + \dfrac{MS_{er} - MS_z}{12}\right] + \dfrac{1}{en} MS_z$

where e is the number of experiments and n is the number of females
in each experiment. Variance estimates such as (2) have been called
complex estimates (2). Thus a complex estimate of variance is a
linear function of independent mean squares.

It is stated in (1) that "increasing the number of females
indefinitely still leaves us with

(3)  $V(\bar{x}_{.i.}) = \dfrac{MS_e + 24\,MS_{er} - 25\,MS_z}{300\,e} = \dfrac{173}{e}$."

Conclusions are then reached without analysis of the sampling
errors involved. Now the standard deviation of $V(\bar{x}_{.i.})$ is very large,

(4)  $\sigma_V = \dfrac{1}{300\,e}\sqrt{2\left[\dfrac{(MS_e)^2}{5} + \dfrac{(24\,MS_{er})^2}{74} + \dfrac{(25\,MS_z)^2}{1{,}102}\right]} = 0.57\,V(\bar{x}_{.i.});$

and further analysis leading to confidence limits for $V(\bar{x}_{.i.})$ should be
helpful in choosing a course of action.
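
The arithmetic behind the figures 173 and 0.57 can be reproduced as follows. This is a sketch, not the author's computation: it assumes the variance of each mean square is estimated as 2(MS)^2/(r + 2), an assumption suggested by the text's later remark about "r + 2 in place of r", and the variable names are illustrative.

```python
import math

# Mean squares and degrees of freedom from the analysis of variance table.
ms = {"e": 46659.0, "er": 459.0, "z": 231.0}
df = {"e": 3, "er": 72, "z": 1100}
# Coefficients of the complex estimate (per 1/e), n taken as infinite.
coef = {"e": 1 / 300, "er": 24 / 300, "z": -25 / 300}

v = sum(coef[k] * ms[k] for k in ms)

def rel_sd(offset):
    """Standard deviation of v relative to v, using divisor r + offset."""
    var_v = sum(2 * (coef[k] * ms[k]) ** 2 / (df[k] + offset) for k in ms)
    return math.sqrt(var_v) / v

ratio_r_plus_2 = rel_sd(2)   # divisor r + 2 (assumed to match the text)
ratio_r = rel_sd(0)          # divisor r, for comparison
print(round(v, 1), round(ratio_r_plus_2, 2), round(ratio_r, 2))
```

With divisor r + 2 the relative standard deviation is about 0.57, matching the value quoted; with divisor r it would be about 0.73. Either way the first mean square, with only 3 degrees of freedom, dominates the uncertainty.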

The writer has studied the distribution of complex estimates of 
variance in a paper (2) in Psychometrika. Since this paper may not 
be readily available to biometricians, the principal results are outlined 
below and a few applications are given. 


2. THE DISTRIBUTION OF COMPLEX ESTIMATES OF VARIANCE 


The exact distribution of a complex estimate of variance is too 
involved for everyday use. It is therefore proposed to use, as an 
approximation to the exact distribution, a chi-square distribution in 
which the number of degrees of freedom is chosen so as to provide 
good agreement between the two. This is accomplished by arranging 
that the approximating chi-square have a variance equal to that of 


the exact distribution. If $MS_1$, $MS_2$, ... are independent mean
squares with $r_1$, $r_2$, ... degrees of freedom and

(5)  $V_s = a_1(MS_1) + a_2(MS_2) + \ldots$

is a complex estimate of variance based on them, the number of degrees
of freedom of the approximating chi-square is found to be given by

(6)  $r_s = \dfrac{[a_1 E(MS_1) + a_2 E(MS_2) + \ldots]^2}{\dfrac{[a_1 E(MS_1)]^2}{r_1} + \dfrac{[a_2 E(MS_2)]^2}{r_2} + \ldots}$

where E( ) denotes mean or expected values.

In practice, the expected values of the independent mean squares
will not be known. The observed values will usually be substituted in
(6), giving, as an estimate of $r_s$,

(7)  $r_s = \dfrac{[a_1(MS_1) + a_2(MS_2) + \ldots]^2}{\dfrac{[a_1(MS_1)]^2}{r_1} + \dfrac{[a_2(MS_2)]^2}{r_2} + \ldots}$

An approximation of this type for a slightly simpler problem was
first suggested by H. Fairfield Smith (3). In his problem, there were
only two mean squares and $a_1 = a_2 = 1$. This approximation does not
support the use of r + 2 in place of r as a correction for bias [(1)
formula 3].
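
Formula (7) is simple to compute. The sketch below is one possible implementation; the function name and the input layout (a list of coefficient, mean square, degrees-of-freedom triples) are choices made here for illustration.

```python
from fractions import Fraction

def satterthwaite_df(terms):
    """Approximate degrees of freedom for sum(a_i * MS_i).

    terms: sequence of (a_i, MS_i, r_i) triples, where r_i is the
    number of degrees of freedom of the mean square MS_i.
    """
    terms = list(terms)
    numerator = sum(a * ms for a, ms, _ in terms) ** 2
    denominator = sum((a * ms) ** 2 / r for a, ms, r in terms)
    return numerator / denominator

# Two mean squares with a_1 = a_2 = 1, values in the ratio 1 : 4,
# and 4 and 2 degrees of freedom (exact arithmetic via Fraction).
r_s = satterthwaite_df([(1, Fraction(1), 4), (1, Fraction(4), 2)])
print(r_s)
```

Because (7) depends only on the relative sizes of the terms, multiplying every mean square by a common factor leaves the result unchanged.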

The writer has checked the accuracy of the suggested approximations
by calculating the exact distribution for a number of special
cases. Typical results are as follows:


                                            $\chi^2$(95%)        $\chi^2$(99.9%)
$r_1$   $r_2$   $E(MS_2)/E(MS_1)$   $r_s$     exact  approx.     exact  approx.

  4       2            4           100/33      7.9     8.0       16.2    17.3
  8       4            1            32/3      19.4    19.5       30.5    31.0
  6       4            2            54/7      15.1    15.3       26.0    27.2
 20       4            2           180/21     16.2    17.0       27.7    29.0
  4       2            1            16/3      11.5    11.7       21.3    22.3


The above discrepancies between the exact and the approximate 
chi-squares, even for the extreme 99.9 percent case, are very small 
compared with their sampling errors. Thus it appears that the ap- 
proximation may be used with confidence. Furthermore, we know 
from general reasoning that if r, is large, both the approximate and 
the exact distributions approach the same normal distribution; if r, is 
small, the sampling errors in the chi-squares are large and refinement 
is superfluous. 

Some care must be taken in the cases where one or more of the a’s 
in (5) are negative. If it is possible for V, to be negative with a fairly 
large probability, the approximate distribution will become rather poor 
since it cannot allow negative estimates. However, here again the
sampling errors in V, will be quite large compared with its expected 
value so that only the sketchiest of conclusions can be drawn in any 
case. 


3. FURTHER ANALYSIS OF CRUMP'S EXAMPLE


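The distribution of Crump's estimate of the residual variance of
the race means,

(3)  $V(\bar{x}_{.i.}) = \dfrac{MS_e + 24\,MS_{er} - 25\,MS_z}{300\,e}$

can now be approximated. Thus

(8)  $V = \dfrac{1}{e}\left[\dfrac{46{,}659}{300} + \dfrac{(24)(459)}{300} - \dfrac{(25)(231)}{300}\right] = \dfrac{1}{e}(155 + 37 - 19) = \dfrac{173}{e}$

From (7) we have

(9)  $r_s = \dfrac{[155 + 37 - 19]^2}{\dfrac{(155)^2}{3} + \dfrac{(37)^2}{72} + \dfrac{(19)^2}{1{,}100}} = \dfrac{29{,}929}{8{,}008 + 19 + 1} = 3.7$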

From chi-square tables interpolated for 3.7 d.f. at the 5 percent and
95 percent points we find that, with a high degree of probability,

(10)  $0.60 < \chi^2 < 9.0$

or

(11)  $\dfrac{(3.7)(173)}{9.0\,e} < V < \dfrac{(3.7)(173)}{0.60\,e}$, that is, $\dfrac{71}{e} < V < \dfrac{1{,}067}{e}$.

Thus if it were necessary to reduce V to 9 and if time were important
so that a second series of experiments could not be made, we
should run

(12)  $e = \dfrac{1{,}067}{9} = 119$

experiments in the first series for confidence that V would be properly
reduced. On the other hand, if the experiments were expensive and
time not important, we might run

(13)  $e = \dfrac{(3.7)(173)}{(5.6)(9)} = 13$

experiments and then get a more accurate estimate of V to determine
how many additional experiments should be run (5.6 obtained from
the 20 percent point for chi-square, 3.7 d.f.).
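
The steps of this section can be retraced in a few lines. The sketch below uses the rounded terms 155, 37 and 19 and the chi-square points 0.60 and 5.6 quoted in the text for 3.7 d.f.; the target variance of 9 is an assumption read off the denominator of (13), and the helper name is illustrative.

```python
import math

# Terms a_i * MS_i of the complex estimate (per 1/e) with their d.f.
components = [(155.0, 3), (37.0, 72), (-19.0, 1100)]

v1 = sum(c for c, _ in components)                       # 173
r_s = v1 ** 2 / sum(c * c / r for c, r in components)    # formula (7)
r_s_rounded = round(r_s, 1)                              # 3.7, as in the text

def experiments_needed(target_v, chi2_point):
    """Smallest e with r_s * v1 / (chi2_point * e) <= target_v."""
    return math.ceil(r_s_rounded * v1 / (chi2_point * target_v))

# Chi-square points for 3.7 d.f. as quoted in the text.
e_cautious = experiments_needed(target_v=9, chi2_point=0.60)  # 5 percent point
e_first = experiments_needed(target_v=9, chi2_point=5.6)      # 20 percent point
print(r_s_rounded, e_cautious, e_first)
```

The wide gap between the two answers reflects how little is known about V when the dominant mean square has only 3 degrees of freedom.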


4. DIFFERENCE OF MEANS


The usual estimate of variance used in Student t tests for the
difference of two means is

(14)  $V = \dfrac{r_1 MS_1 + r_2 MS_2}{r_1 + r_2}\left[\dfrac{1}{r_1 + 1} + \dfrac{1}{r_2 + 1}\right]$

with

(15)  $r = r_1 + r_2$

degrees of freedom. This assumes that both populations have the same
variance. Seldom do we have positive evidence that this is so and
often we have evidence that the variances are different. For example,
$F = (MS_1)/(MS_2)$ may be significant. Note that a non-significant F
is not evidence that the variances are equal, especially if one of the
MS's has a small number of degrees of freedom.

The assumption of equal variances can be avoided by use of a
complex estimate of variance,

(16)  $V_s = \dfrac{MS_1}{r_1 + 1} + \dfrac{MS_2}{r_2 + 1}$

with

(17)  $r_s = \dfrac{\{[MS_1/(r_1 + 1)] + [MS_2/(r_2 + 1)]\}^2}{\dfrac{[MS_1/(r_1 + 1)]^2}{r_1} + \dfrac{[MS_2/(r_2 + 1)]^2}{r_2}}$

degrees of freedom.
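For example, consider the numerical case:

    $MS_1$ = 100, $r_1$ = 99,
    $MS_2$ =  90, $r_2$ =  9.

By the standard analysis one would obtain

(18)  $V = \dfrac{(99)(100) + (9)(90)}{108}\left[\dfrac{1}{100} + \dfrac{1}{10}\right] = 10.9, \qquad r = 99 + 9 = 108$

The complex estimate gives

(19)  $V_s = \dfrac{100}{100} + \dfrac{90}{10} = 10.0, \qquad r_s = \dfrac{(10.0)^2}{\dfrac{(1)^2}{99} + \dfrac{(9)^2}{9}} = 11$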

One will sometimes reach different conclusions with 108 degrees of 
freedom from those he will reach with 11 degrees of freedom. 
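
The numerical case can be worked through in code. This is a minimal sketch with hypothetical function names; each sample is taken to have r + 1 observations, as in formulas (14) and (16).

```python
def pooled_estimate(ms1, r1, ms2, r2):
    """Variance of the difference of means assuming equal variances."""
    pooled_ms = (r1 * ms1 + r2 * ms2) / (r1 + r2)
    v = pooled_ms * (1 / (r1 + 1) + 1 / (r2 + 1))
    return v, r1 + r2

def complex_estimate(ms1, r1, ms2, r2):
    """Variance of the difference of means without that assumption,
    with approximate degrees of freedom from formula (7)."""
    t1, t2 = ms1 / (r1 + 1), ms2 / (r2 + 1)
    v = t1 + t2
    df = v ** 2 / (t1 ** 2 / r1 + t2 ** 2 / r2)
    return v, df

v_pooled, df_pooled = pooled_estimate(100, 99, 90, 9)
v_complex, df_complex = complex_estimate(100, 99, 90, 9)
print(round(v_pooled, 1), df_pooled, round(v_complex, 1), round(df_complex, 1))
```

The two variance estimates are close here, but the degrees of freedom differ by a factor of about ten because the small sample contributes nearly all of the variance of the difference.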

If from general reasoning or other a priori considerations it is be- 
lieved that both MS, and MS, are independent estimates of the same 
variance, then the use of 108 degrees of freedom is justified. On the 
other hand, if the given data are the entire admissible knowledge, then 
the use of more than 11 degrees of freedom is not valid. 


5. CONCLUSION 


In many practical problems the most efficient estimate of variance 
available is a linear function of two or more independent mean-squares. 


Usually the exact distribution of such estimates is too complicated for
practical use. A satisfactory approximation can be based on the
chi-square distribution with the number of degrees of freedom determined
by (7).


Many problems, such as the difference of means, can be more con- 
servatively analyzed by use of complex estimates of variance. Assump- 
tions regarding homogeneity of variance can then be avoided. 
PRELIMINARY REPORT ON THE RECTANGULAR
LATTICES


BOYD HARSHBARGER


Virginia Agricultural Experiment Station and 
Virginia Polytechnic Institute 


The theory of the experimental arrangement which is now called 
the lattice had its beginning with Yates in 1936. It has been further 
developed by Yates and Cochran in numerous papers beginning in 1939 
and extending through 1943. The lattice design requires that the 
number of varieties be an exact square. Within each replication, the 
varieties are laid out in incomplete blocks so that each row or incom- 
plete block contains the same number of varieties. The grouping of 
varieties into blocks in the various replications is made according to a 
set of patterns which are so arranged that, in them, no pair of varieties 
occurs together within a block more than once. Each pattern may be 
employed one or more times (usually two). These patterns are referred
to hereafter as Group X, Group Y, etc. This symmetry makes
it possible to adjust the varietal total or average yields by simple 
calculations for variations in the fertility among the incomplete blocks. 

To avoid the restriction that the number of varieties must be a 
perfect square, Yates introduced a design in which the blocks vary in 
size from one replication to another. This he called the pseudo-fac- 
torial with unequal groups of sets. However, no attempt was made 
to use inter-block information, and the design proposed by him does 
not conveniently lend itself to such an analysis. 

This paper presents a few of the preliminary results on incomplete 
block designs in which the number of varieties is the product of two 
consecutive integers. The arrangement differs from Yates’ non-square 
design since the blocks are all the same size and the variety means are 
adjusted for both the inter- and intra-block information. It has a 
closer resemblance to the ordinary lattice design than does the pseudo- 
factorial with unequal groups of sets. The name Rectangular Lattice 
is proposed for this design since the word lattice carries no implication 
of squareness. 

The general theory of Rectangular Lattices is to be published in a 
Virginia Agricultural Experiment Station Bulletin which will also 
carry numerical examples of simple and triple rectangular lattices.
Copies of this bulletin will be available on request. 
As in the square lattice, the varieties are arranged in two groups,
X and Y, each of which is replicated as shown below. For simplicity
of illustration, a 3 x 4 rectangular lattice is used. The numbers
designate varieties.


TABLE I

                      Group X
Blocks                        Blocks
 (1)     1    2    3           (1)     1    2    3
 (2)     4    5    6           (2)     4    5    6
 (3)     7    8    9           (3)     7    8    9
 (4)    10   11   12           (4)    10   11   12

                      Group Y
Blocks                        Blocks
 (1)     1    4    7           (1)     1    4    7
 (2)     5    8   10           (2)     5    8   10
 (3)     2    9   11           (3)     2    9   11
 (4)     3    6   12           (4)     3    6   12


In practice the varieties are randomized within blocks and the 
blocks are randomized within the replicates. 
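
The defining property of the arrangement, that no pair of varieties occurs together in a block in more than one group, can be verified directly for Table I. The sketch below is only a check written for this illustration; the helper name is not from the paper.

```python
from collections import Counter
from itertools import combinations

# Blocks of the 3 x 4 rectangular lattice in Table I.
group_x = [(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12)]
group_y = [(1, 4, 7), (5, 8, 10), (2, 9, 11), (3, 6, 12)]

def pair_counts(*groups):
    """Count how often each pair of varieties shares a block."""
    counts = Counter()
    for group in groups:
        for block in group:
            counts.update(combinations(sorted(block), 2))
    return counts

counts = pair_counts(group_x, group_y)
max_together = max(counts.values())
print(len(counts), max_together)
```

Each of the 8 blocks contributes 3 pairs, and all 24 pairs are distinct, so no two varieties meet in both Group X and Group Y.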

For purposes of enumeration and computation the rectangular 
lattice may be thought of as a square lattice with k varieties missing in 
such a way that the missing varieties occur once in each row and once 
in each block. With this arrangement the groups for a k(k-1) 


rectangular lattice (together with more convenient subscripts) are 
shown in Table II. 


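TABLE II

                           Group X
Blocks
 (1)      $v_{1,1}$     0            $v_{1,3}$     ...   $v_{1,k}$     $B_{x1}$
 (2)      $v_{2,1}$     $v_{2,2}$    0             ...   $v_{2,k}$     $B_{x2}$
 ...
 (k-1)    $v_{k-1,1}$   $v_{k-1,2}$  $v_{k-1,3}$   ...   0             $B_{x,k-1}$
 (k)      0             $v_{k,2}$    $v_{k,3}$     ...   $v_{k,k}$     $B_{xk}$

                           Group Y
Blocks
 (1)      $v_{1,1}$     $v_{2,1}$    ...   0             $B_{y1}$
 (2)      0             $v_{2,2}$    ...   $v_{k,2}$     $B_{y2}$
 (3)      $v_{1,3}$     0            ...   $v_{k,3}$     $B_{y3}$
 ...
 (k)      $v_{1,k}$     $v_{2,k}$    ...   $v_{k,k}$     $B_{yk}$
          $T_{y1}$      $T_{y2}$     ...   $T_{yk}$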

Table II serves a double purpose. The body of the table gives the
pattern of the arrangement in the blocks of the k(k-1) varieties, the
v's being simply symbols for varieties. However, if each v is regarded
as the total of the two observations on a variety over the two replicates
of a group, then this table is one which is made up in the course of the
analysis. The marginal totals, the B's and T's, are formed on this
basis. Thus, $B_{xi}$ is the sum of the yields of the varieties which occur
in the i-th blocks of the two replications in Group X. $T_{xi}$ is the sum of
the yields, in Group X, of the varieties listed in block i of Group Y, etc.

By a rather tedious mathematical process, which is given in detail
in the Experiment Station Bulletin, the following formulas and
equations are evolved. $R_e$ is the sum for replicate e, $A_{hi}$ is the difference
between blocks within Group h of block i, $V_j$ is the sum of variety j
from the four replicates. G is the grand total, and $y_{eij}$ is the individual
observation.

The weights are calculated by two simple formulas 


1 -2P) 


w- 5 (3k? —7k—-1)?+4(k-1)? 
and 
1 
We 


where N represents the mean square of component (a)
and Q represents the mean square of the error.
TABLE III


ANALYSIS OF VARIANCE TABLE
Sources of variation 
a/f Sum of squares 
4 G 
Replicate 3 = 
Component 1) y k An? (R, + B,)? 
2(k-1) 2k(k-1) 
Component (b) 1 k : 
4k(k-2)| 


k 
$= 


-2[(2, +B.) (Bs +B.) 17} 
G 
Varieties k(k-1)-1 


Error (residuals) 
3k?-7k+1 by subtraction 


G 


Total 4k(k-1)-1 S¥eti*— 


where the subscript k + 1 is to be replaced by unity when it occurs. 


The variety means are adjusted by using both the inter- and
intra-block weights. This is accomplished by calculating certain constants
and subtracting them from the variety averages. If the averages are
arranged in the order of the X group, then the constants to be
subtracted from row i and column j are as follows:

W-w’ 

x {(k- 1)(W+ Ww’) (B,,- Ty) (W Ww’) (By 
where the subscript k + 1 is to be taken as unity, and
W-W’
x {(k-1)(W+ W’) (B,,- 74) (W- W’) (Bog) }
where the subscript zero is to be taken as k.

The standard error of the adjusted varietal means. The standard
error of the difference between the means of two varieties occurring
together in the same block is
(k-1)(W-W’)(W+W’) 
2W [kW + 
For two varieties not occurring together in the same block the
standard error of the difference between the means is
(W - W’)[(2k-1)W+ (2k-3)W’] | 
2W [kW + 
The average standard error of all varietal comparisons is 
1 Ww-w’ 
(k-2) (W+W’)+(k?-3k+8) [ | 
[kW + 

The efficiency relative to randomized complete block design for
rectangular and square lattices is given in the table below for different
values of W/W'. For each pair of rows, the upper figures give the
percentage efficiency for a rectangular lattice, in which the number of
varieties is the product of two consecutive integers, and the lower
figures give the same for a square lattice. The size of the lattice, and
hence whether it is a square or rectangular lattice, is given in the left
hand column of the table.


TABLE IV


PERCENTAGE EFFICIENCIES OF DESIGNS OF RECTANGULAR LATTICES AND
SQUARE LATTICES RELATIVE TO RANDOMIZED COMPLETE BLOCKS


                                W/W'

              1     2     3     4     6     8    10

 5 x 4      100   106   116   128   154   181   208
 5 x 5      100   105   114   125   148   172   196
 6 x 5      100   105   114   124   147   171   195
 6 x 6      100   104   112   122   142   164   185
 7 x 6      100   104   112   121   142   163   184
 7 x 7      100   104   111   120   138   157   176
 8 x 7      100   104   ...   119   137   156   176
 8 x 8      100   103   110   118   134   152   169
 9 x 8      100   103   110   117   134   151   169
 9 x 9      100   103   109   116   131   147   163
10 x 9      100   103   109   116   131   147   163
10 x 10     100   103   108   115   129   143   158
QUERIES

(39)

QUERY: I have been informed that there is a formula,

$N = t^2 s^2/d^2$,

to find the number of samples necessary to take within a given
percentage of error.

It happens that I am interested in a problem of this kind and I was
using the formula

$N = 2t^2 s^2/d^2$,

(Soil Sci., Vol. 58: 275-288, 1944) which gives results twice as high as
the former. Will not the latter formula give a better approach to the
problem?

ANSWER: The two formulas, giving answers to two different
questions, cannot be used alternatively. Let us consider them separately.

First. A sample of n observations provides a value of s which
indicates unsatisfactory reliability. How large a sample is required,
drawn from the same population, to estimate the population mean with
fiducial limits no greater than $\bar{x} \pm d$, where $m - \bar{x} = d$? The first
formula is a rough approximation to this sample size.

Second. Two samples are drawn, each consisting of n observations. 
The pooled standard deviation, s, and the difference between the means, 
d, do not indicate significance. How large should these samples be to 
show significance, assuming the population difference to be not less 
than d? The second formula, quoted by Cline, gives a rough approxi- 
mation to N. 
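
The two formulas can be put side by side in code. This is a sketch with hypothetical function names; the first formula is taken as N = t^2 s^2/d^2, the form applied in the worked figures of the following query, and the example values (s = 6, d = 1.2, t = 2.032) are borrowed from that query.

```python
def n_single_sample(t, s, d):
    """Rough size of one sample whose half interval for the mean is d."""
    return (t * s / d) ** 2

def n_two_samples(t, s, d):
    """Rough size of each of two samples needed to detect a difference d."""
    return 2 * (t * s / d) ** 2

# t must be looked up with the degrees of freedom appropriate to each
# question; the same value is used here only to show the factor of two.
n_one = n_single_sample(t=2.032, s=6.0, d=1.2)
n_two = n_two_samples(t=2.032, s=6.0, d=1.2)
print(round(n_one), round(n_two))
```

Both results are approximations with roughly even odds of success, so they are starting points rather than guarantees.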

In each formula the assumption is made that the odds are about 
50-50 that the prospective sample will correctly answer the question 
posed. If you wish to be more confident of the outcome, larger samples 
are required (see query next following). 

Incidentally, if you got one result twice the other, I suspect an
incorrect substitution of t in the formulas. In the first, t has degrees
of freedom, n - 1, while in the second, d.f. = 2(n - 1).

GEORGE W. SNEDECOR 


(40) 
QUERY: I have a sample of 35 observations with mean, 24, and
standard deviation, 6. I calculated the half confidence interval,
$s\,t_{.05}/\sqrt{n} = 2.061$, so that I cannot be reasonably certain of being within
less than 8.6% of the population mean. What sample size do I need
to be assured that my sample mean will be within 1.2 points of the
population mean, 1.2 being 5% of the present sample mean?


ANSWER: The formula for the prospective sample size is

$\sqrt{N} = (s/d)\,t\,\sqrt{F}$

where d is the desired half confidence interval, 1.2, t is at any specified
level (0.05, say) with N - 1 degrees of freedom, and F has the degrees
of freedom, $n_1 = N - 1$ and $n_2 = 35 - 1 = 34$. F may be set at the
tabulated point which gives you satisfactory assurance that your proposed
sampling will be successful.

Since N, t and F all pertain to the sample whose size is to be
determined, solution of the equation must be by successive
approximation. From the first sample, together with your specification of the
half interval, s/d = 6/1.2 = 5. For any contemplated sample above 30,
t can be set at 2. Substituting,

$\sqrt{N} = (5)(2)\sqrt{F}$, or N = 100F

With this approximate relation, follow the line for $n_2 = 34$ in the
F table until you find N = 100F. If you are using Snedecor's table
and P = 0.05, you will observe that, at $n_1 = 100$, N = 101 is less than
100F = 164; while at $n_1 = 200$, N = 201 is greater than 100F = 161.
Linear interpolation places N at about 163. This means that if you
draw a sample of 163 from the same population as before, the
probability is 0.95 that the new half interval will be 1.2 or less. Whether
this value is more or less than 5% of the new sample mean is
not specified.
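
The interpolation step can be written out explicitly. The sketch below uses only the two tabulated F values quoted in the answer (1.64 at $n_1 = 100$ and 1.61 at $n_1 = 200$, both for $n_2 = 34$); the function name and argument layout are illustrative, and a fuller treatment would look up F from a table or library at each trial value.

```python
import math

def solve_sample_size(ratio, t, f_points):
    """Solve N = (ratio * t)^2 * F(N - 1, n2) by linear interpolation.

    ratio: s/d from the first sample.
    f_points: two (n1, F) pairs from the F table that bracket the answer.
    """
    c = (ratio * t) ** 2
    (n1_lo, f_lo), (n1_hi, f_hi) = f_points
    # g(n1) = c*F - N, with N = n1 + 1; the solution is where g = 0.
    g_lo = c * f_lo - (n1_lo + 1)
    g_hi = c * f_hi - (n1_hi + 1)
    if g_lo * g_hi > 0:
        raise ValueError("the tabulated points do not bracket the solution")
    n1 = n1_lo + (n1_hi - n1_lo) * g_lo / (g_lo - g_hi)
    return n1 + 1

n = solve_sample_size(ratio=5.0, t=2.0, f_points=[(100, 1.64), (200, 1.61)])
print(round(n, 1), math.ceil(n))
```

Rounding the interpolated value up to the next whole observation gives the sample size of about 163 quoted in the answer.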

In rare instances when one needs improvement of the foregoing
approximation, two refinements can be introduced. (i) Substitute
the more accurate value of t for d.f. = 162; that is, $t_{.05} = 1.975$. (ii)
Instead of linear interpolation, use the method described by Fisher in
section 41 of his "Statistical Methods"; that is, plot F against the
reciprocal of $n_1$, then interpolate linearly.

You may not demand such assurance as 19 in 20 that the new 
sample will turn up d=1.2 or less. If you are satisfied with chances 
of 4 in 5, you can use the 20% points of F (variance ratio) in the 
Fisher and Yates table. Other probabilities are tabulated by Mer- 
rington and Thompson in Biometrika, Vol. 33, pp. 73-88 (1943). 

The approximation described for the preceding query is based on 
the value, F = 1, the assumption being that this is the 50% value of F. 
The closeness of the approximation can be assessed by examination of 
the 50% points in Merrington and Thompson’s table. F = 1 only if
$n_1 = n_2$, but the approximation is generally satisfactory if both $n_1$ and
$n_2$ are greater than 20. Illustration: approximate N from the data
given above but with F at its 50% point. A double interpolation is
required, as in Fisher’s example, the result being N = 100. The interpretation
is that a second sample of 100 observations is as likely as not
to yield d = 1.2 or less.

This should be compared with the solution given by the first formula 
in the foregoing query. Since t = 2.032 for d.f. = 34, we have

$N = (2.032)^2 (6)^2 / (1.2)^2 = 103$

The excellence of this approximation is due to: (i) the small discrepancy
between the original value of t for 34 d.f., 2.032, and the correct
value for the new sample, t = 1.984, d.f. = 99; and (ii) the close approach
of the 50% F to 1, its value in this example being 1.013.
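
The arithmetic behind these comparisons can be checked directly. The short Python sketch below assumes only the figures already quoted (s = 6, d = 1.2, a first sample of 35, t = 2.032 and 1.984, and the 50% F of 1.013); the variable names are illustrative and not from the original.

```python
import math

s, d = 6.0, 1.2      # first-sample standard deviation and desired half interval
n_first = 35         # size of the first sample (34 d.f.)

# Half interval of the first sample, s * t / sqrt(n), with t = 2.032 for 34 d.f.
half_first = s * 2.032 / math.sqrt(n_first)

# First formula: N = t^2 s^2 / d^2, using t from the first sample.
n_simple = (2.032 * s / d) ** 2

# 50% solution: N = (s/d)^2 t^2 F, with t = 1.984 (99 d.f.) and F = 1.013.
n_median = (s / d) ** 2 * 1.984 ** 2 * 1.013

if __name__ == "__main__":
    print(round(half_first, 3), round(n_simple), round(n_median))
```

The two values of N differ by only about three observations, which is the agreement the paragraph above attributes to the small change in t and to the 50% F lying close to 1.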
A. M. MOOD AND GEORGE W. SNEDECOR


(41) 

QUERY: As a “practical statistician” I sometimes wonder about the
justification of utilizing certain refinements in practical every-day 
statistics. My reaction is that a practical statistician, to be competent, 
should know statistical method; but, to be practical, he should also 
realize that crude methods may often adequately serve immediate 
needs. Too, when errors are so very easy to enter, it sometimes 
seems to me that precise methods are of questionable practical utility. 
Broadly, and specifically, it seems to me that in dealing with any but 
relatively small samples in “practical” fields, utilizations of such devices
as degrees of freedom and other similar adjustments are not generally
warranted. Perhaps, I might express myself more clearly if I
were to say that it seems to me that such refinements have definite limi- 
tations and should not be used indiscriminately. I’d appreciate your 
brief comments. 


ANSWER: I can agree to most of your comments—the experienced 
statistician can often anticipate the outcome of an investigation quite 
accurately and thus avoid tedious computation. Or, he may be con- 
fident that a simple method will extract all the information necessary 
for practical purposes, thus avoiding elaborate processes. A colleague 
of mine refers to this as “zoot” statistics, a kind used freely by competent
investigators.

Some research men tolerate rough treatment of their data while a 
few insist upon it; but my experience has been that most of them are 
satisfied with nothing less than complete extraction of available infor- 
mation. Can one blame them? After they have labored long over 
exacting techniques for incorporating information in their data, there
is every reason for using the most efficient statistical methods for elicit- 
ing the last ounce of it. Usually the statistical work, even the most 
complicated, is trivial compared to the time, energy, and money that 
have gone into the experimental work.

Doesn’t it all boil down to this: how much is the information worth, 
in dollars and cents and human energy? Certainly one should dis- 
courage the expenditure of money or effort if there is clearly little 
information in the data, or if the results can obviously have little value. 
But data which are packed with expensive and useful information 
warrant the utmost refinement in statistical methods of extraction. 

GEORGE W. SNEDECOR


(42) 

QUERY: In certain types of investigation it is possible to make rather
accurate observational classifications, but satisfactory physical meas- 
urements have not been developed. Numerical values may be assigned 
to these classes, although because of lack of physical-measurement 
standards such values may bear no exact proportional relation to the 
characters observed. The distribution of the values may be approxi- 
mately normal. 

Is it permissible to treat such data by analysis of variance (1) when 
the distribution is normal, and (2) when the distribution deviates from 
normal?


ANSWER: Several distinctions should be made before any answer has 
meaning. 

First. Analysis of variance is merely an arithmetical process of 
allocating variance to observed classes and, as such, is applicable to 
any numerical data. Contrariwise, probability statements, such as 
those involved in tests of significance, are affected by the distribution 
of the measured variate. I assume that the query is about the accuracy
of the probability turned up in an F test of significance. 

Second. Observational classifications are opinions of the observer 
based on physical characteristics of the material being judged. Pre- 
sumably querist is asking whether the test of significance will lead to 
correct conclusions about the material, irrespective of the observer. 
This involves the appropriate design of the sampling: it must include 
independent observations by at least two persons who have the ability 
and the training required to align their opinions with facts. An allow- 
ance for variation among opinions must be made available by the 
analysis of variance and must be included in the estimate of error. 

Third. The assignment of numbers to the observational classes is
understood to be governed by the physical characteristics of the observed
material; otherwise, conclusions based on such numbers may be
unrelated to the phenomenon being investigated. Any statistical procedures,
such as averages, analysis of variance and related tests of
significance, are futile if the numbers used do not measure the variate
being studied. In certain large fields of investigation, notably in
mental measurements, there are reasons to believe that the numbers
should be assigned so that the resulting distributions are normal; but
this, while statistically convenient, is clearly not universal.

With these distinctions in view, my answer to the first question is 
“yes.” The second question introduces difficulties in statistics, some
of which were discussed in Query No. 32, Vol. 2, No. 4, pp. 73-74 of this
Bulletin. If anormality is extreme, the probability associated with
the test of significance may be seriously affected: the advice of a
mathematical statistician should be sought.

GEORGE W. SNEDECOR


NEWS AND NOTES


Eight new members have been added to the staff of the Statistical
Laboratory, Iowa State College. They are: GEORGE W. BROWN
from the R.C.A. laboratories; LEONID HURWICZ, recently research
associate, Cowles Commission, Chicago; ALEXANDER M. MOOD,
who was research associate at Princeton; HERMAN O. DRABENSTOTT,
CLIFFORD J. MALONEY and NORMAN V. STRAND, who
have had former associations with this laboratory; JAMES G. DARROCK,
formerly agricultural scientist, Dominion Laboratory of Cereal
Breeding, Winnipeg; and GARNET E. McCREARY, who received his
M.A. from Queen’s in 1946. The new bi-monthly publication, “Statlab
Review,” by the Iowa State College Statistical Laboratory, has as
its editor, JOSEPH C. DODSON. This paper gives a brief report of
the two “Allied Missions to Observe the Greek Elections.” . . . After
returning from two years of duty as an Operations Analyst with the 
Army Air Forces in the European Theater, A. E. BRANDT returned 
to the Soil Conservation Service as Research Specialist in charge of 
experimental design and analysis of data. On October 1, he reported 
to the Naval Ordnance Laboratory as Statistical Consultant on the 
staff of the Technical Director of the laboratory. MRS. BRANDT 
was recently thrown from one of their horses and broke her hip. Our 
sympathy and may the recovery be speedy. . . . T. A. BANCROFT
is now on the staff of the Department of Mathematics, The University 
of Georgia, Athens. He writes, “I am organizing courses in statistics
here and starting consulting work for various research workers. The
University of Georgia Science Club held a symposium on ‘The contribution
of statistical methods to research’ on Tuesday, November 26.
In the evening at eight o’clock, W. G. COCHRAN delivered the principal
address. His address was preceded by a dinner in his honor
given by members of the Science Club. In the afternoon at four
o’clock a number of short addresses were given on the necessity for
statistical methods in research in the various fields of applied science.
These talks were given by B. O. WILLIAMS, A. S. EDWARDS, W.
T. HICKS, EDWIN JAMES and CHARLES C. WILSON. I spoke
on ‘Statistics as a mathematical subject.’” . . . WARREN H.
LEONARD has returned to Colorado A & M College, Fort Collins,
after an absence of four years in the army. . . . CHARLES P.
WINSOR, Department of Biostatistics, Johns Hopkins University,
who resides at 615 North Wolfe Street, Baltimore, Maryland, is editor 
of Human Biology. He would particularly like to get papers which 
are of interest from the point of view of quantitative methodology.
The editor of this Bulletin extends condolences. We need articles
too! . . . The Columbia Broadcasting System has started a series of
radio discussions on “You and Alcohol” with initial discussion by
E. M. JELLINEK, biometrician and director of the Section on Alcohol
Studies of the Laboratory of Applied Physiology, Yale. . . . ALFRED
SAUVY, Director, National Institute of Demographic Studies, 20, Rue
de la Baume, Paris, with his wife and PAUL E. VINCENT visited the
Institute of Statistics at Raleigh November 5. They were interested
in seeing the cotton-growing experiments and in hearing and watching
a tobacco auction. . . .

At the Annual Meeting of the Animal Vitamin Research Council 
held in Washington October 17, 1946, it was voted to change the name 
of the organization to the Animal Nutrition Research Council. This 
action was taken in conformity with the expanded objectives of the 
Council which will include the study of animal nutrients other than 
vitamins. Plans were discussed for the collaborative investigation of 
the toe-ash procedure in the A.O.A.C. chick assay for vitamin D. This 
procedure, if officially adopted, will substantially reduce the time 
necessary for the completion of vitamin D assays which now entail 
the use of solvent-extracted dried tibiae. Conferences were held on 
various other research projects relating to the biological assay of 
animal nutrients to be carried out under the sponsorship of the Animal 
Nutrition Research Council. Officers elected for the new term are: 
Chairman, KENNETH MORGAREIDGE, Director of Research and 
Control Laboratories, Vitamin Division, National Oil Products Co., 
Harrison, N. J.; Secretary, FULLER D. BAIRD, Standard Brands,
New York City; Treasurer, GEORGE H. KENNEDY, E. I. du Pont 
de Nemours, New Brunswick. In addition to the above, new members 
of the Executive Committee include C. I. BLISS, HERBERT C. 
SCHAEFER, R. V. BOUCHER, and H. R. HALLORAN.... 


A NOTE FROM THE EDITORIAL COMMITTEE


This issue of the Biometrics Bulletin completes the second year of 
publication. There are 1150 members and 150 subscribers. This indicates
that an active interest exists in statistical methods for biological
research workers.

Your opinions have been difficult to secure, because the letters to 
the editor have been too few. Those letters containing criticisms which 
were received have been helpful. On the basis of these criticisms, the 
Committee decided to change the format of the Bulletin. Several 
changes will be noted in this issue. The articles are printed in one 
instead of two columns, thus enabling the tables and equations to be 
presented more satisfactorily. The type size has been increased and
the margins widened. An effort will be made to provide a better appearance
and easier reading. The self cover will be continued for the
time being. If the Bulletin is expanded into a journal, a few other 
format changes might be advisable. 

The Editorial Committee has decided to shift to a quarterly pub- 
lication beginning with Vol. 3. The total number of pages currently 
used for six issues will be maintained or increased. Voluntary services 
of the Committee have been taxed heavily with publication deadlines 
every other month. 

Consideration is being given to the feasibility of establishing a 
Biometrics Journal. When the backlog and flow of articles is suffi- 
cient to maintain a journal, this question will be given consideration. 
Meanwhile expressions on the need for a journal on statistical meth- 
odology for research workers in biology would be helpful. What do 
you want? 

Please continue to send queries to Professor G. W. Snedecor and 
news items to the Chairman of the Editorial Committee. 

We wish to take this means of expressing our appreciation for your 
cooperation. 

GERTRUDE M. Cox, Chairman 
Editorial Committee 


STATEMENT OF THE OWNERSHIP, MANAGEMENT, CIRCULATION, ETC., REQUIRED
BY THE ACTS OF CONGRESS OF AUGUST 24, 1912, AND MARCH 3, 1933,
Of Biometrics Bulletin, published bimonthly at Washington, D. C., for 12 months ending
October, 1946.

Washington, D. C., ss. Before me, a Notary Public in and for the State and county
aforesaid, personally appeared Lester S. Kellogg, who, having been duly sworn according
to law, deposes and says that he is the managing editor of the Biometrics Bulletin and
that the following is, to the best of his knowledge and belief, a true statement of the
ownership, management (and if a daily paper, the circulation), etc., of the aforesaid publication
for the date shown in the above caption, required by the Act of August 24, 1912,
as amended by the Act of March 3, 1933, embodied in section 537, Postal Laws and Regu- 
lations, printed on the reverse of this form, to wit: 

1. That the names and addresses of the publisher, editor, managing editor, and business
managers are: Publisher, American Statistical Association, 1603 K Street, N. W.,
Washington 6, D. C.; Editor, Gertrude M. Cox, Institute of Statistics, Raleigh, N. C.;
Managing Editor, Lester S. Kellogg, 1603 K Street, N. W., Washington 6, D. C.; Business 
Managers, None. 

2. That the owner is: (If owned by a corporation, its name and address must be 
stated and also immediately thereunder the names and addresses of stockholders owning 
or holding one per cent or more of total amount of stock. If not owned by a corporation, 
the names and addresses of the individual owners must be given. If owned by a firm, 
company, or other unincorporated concern, its name and address, as well as those of each 
individual member, must be given.) American Statistical Association, 1603 K Street, 
N. W., Washington 6, D. C. 

3. That the known bondholders, mortgagees, and other security holders owning or 
holding 1 per cent or more of total amount of bonds, mortgages, or other securities are: 
(If there are none, so state.) None. 

4. That the two paragraphs next above giving the names of the owners, stockholders, 
and security holders, if any, contain not only the list of stockholders and security holders 
as they appear upon the books of the company but also, in cases where the stockholder or
security holder appears upon the books of the company as trustee or in any other fiduciary
relation, the name of the person or corporation for whom such trustee is acting, is given;
also that the said two paragraphs contain statements embracing affiant’s full knowledge
and belief as to the circumstances and conditions under which stockholders and security
holders who do not appear upon the books of the company as trustees, hold stock and
securities in a capacity other than that of a bona fide owner; and this affiant has no 
reason to believe that any other person, association, or corporation has any interest direct 
or indirect in the said stock, bonds, or other securities than as so stated by him. 

5. That the average number of copies of each issue of this publication sold or 
distributed, through the mails or otherwise, to paid subscribers during the twelve months 
preceding the date shown above is (........). (This information is required from daily 
publications only.) 


LESTER S. KELLOGG, Mg. Ed. 
Sworn to and subscribed before me this 8th day of October, 1946. 
(My commission expires July 31, 1949.) 


BEVERLY RUTH RISTON, Notary Public.

Officers of the American Statistical Association: President, Isador Lubin; Directors,
Chester I. Bliss, E. Grosvenor Plowman, Walter A. Shewhart, Samuel A.
Stouffer, Willard L. Thorp, Helen M. Walker; Vice-Presidents, F. L. Carmichael,
S. S. Wilks, Dorothy Swaine Thomas; Secretary-Treasurer, Lester S. Kellogg.

Officers of the Biometrics Section: Chairman, D. B. DeLury; Secretary, H. W. 
Norton; Section Committee members: E. J. deBeer, A. E. Brandt, J. W. Fertig, 
J. G. Osborne, J. W. Tukey. 

Editorial Committee for the Biometrics Bulletin: Chairman, Gertrude Cox; 
members, R. L. Anderson, C. I. Bliss, W. G. Cochran, Churchill Eisenhart, H. W. 
Norton, G. W. Snedecor, C. P. Winsor. 

Material for the BULLETIN should be addressed to the Chairman of the Edi- 
torial Committee, Institute of Statistics, North Carolina State College, Raleigh,
N. C.; material for Queries should go to “Queries,” Statistical Laboratory, Iowa
State College, Ames, Iowa, or to any member of the committee. 